163 research outputs found

    Correlation Decay and Tractability of CSPs

    The algebraic dichotomy conjecture of Bulatov, Krokhin and Jeavons yields an elegant characterization of the complexity of constraint satisfaction problems. Roughly speaking, the characterization asserts that a CSP L is tractable if and only if there exist certain non-trivial operations, known as polymorphisms, that combine solutions to L to create new ones. In this work, we study the dynamical system associated with repeated applications of a polymorphism to a distribution over assignments. Specifically, we exhibit a correlation decay phenomenon that makes two variables or groups of variables that are not perfectly correlated become independent after repeated applications of a polymorphism. We show that this correlation decay phenomenon can be utilized in designing algorithms for CSPs by exhibiting two applications: 1. A simple randomized algorithm to solve linear equations over a prime field, whose analysis crucially relies on correlation decay. 2. A sufficient condition for the simple linear programming relaxation for a 2-CSP to be sound (have no integrality gap) on a given instance.
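As a small illustration of what "combining solutions" means, the sketch below checks that the map (x, y, z) -> x - y + z, applied coordinate-wise, sends any three solutions of a linear system over a prime field to another solution. The modulus, the equations, and all function names are assumptions chosen for the example; this is not the randomized algorithm from the abstract.

```python
import itertools

P = 7  # assumed prime modulus, chosen only for illustration

# Each equation is (coefficients, right-hand side): sum(c_i * x_i) = b (mod P).
EQUATIONS = [((1, 2, 3), 4), ((2, 0, 5), 1)]


def satisfies(x):
    """Return True if the assignment x satisfies every equation mod P."""
    return all(
        sum(c * v for c, v in zip(coeffs, x)) % P == b
        for coeffs, b in EQUATIONS
    )


def maltsev(x, y, z):
    """Apply x - y + z coordinate-wise mod P to three assignments."""
    return tuple((a - b + c) % P for a, b, c in zip(x, y, z))


# Brute-force all solutions of the toy system.
solutions = [x for x in itertools.product(range(P), repeat=3) if satisfies(x)]

# Every combination of three solutions is again a solution.
closed = all(
    satisfies(maltsev(x, y, z))
    for x, y, z in itertools.product(solutions, repeat=3)
)
print(len(solutions), closed)
```

The closure holds because the coefficients of x - y + z sum to 1, so the right-hand side b is preserved in every equation.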

    Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

    Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design. However, this increase in complexity, together with the growth in the dimensionality of observations, has come at the cost of volatility that adversarial attacks can exploit (i.e., by moving along worst-case directions in the observation space). To solve this policy instability problem, we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations. Furthermore, our technique is computationally efficient and does not depend on the methods used to produce the worst-case directions. We conduct extensive experiments in the Arcade Learning Environment with several different adversarial attack techniques. Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method. Comment: Published in ICML 202

    Religious appreciation and the mundane-sacred: A neglected area of philosophy.

    This dissertation belongs within the field of the philosophy of religion. The thesis proposes three basic ideas. First, there is a kind of religious language and religious experience disregarded in philosophy: the kind of religious language that is philosophically examined is called "mundane-sacred judgment;" the mental state behind that language is called "religious appreciation." Second, these phenomena are relevant to the philosophy of religion and therefore should not be ignored. Third, the philosophical model by which these two linguistic and experiential facts of religion are explicated is aesthetics. Just as metaphysics often supplies the concepts and logical problems associated with, say, the philosophical study of mystical or prayer experience, so it will be shown that the philosophy of aesthetics provides the ideas and difficulties connected with the philosophical study of mundane-sacred judgment and religious appreciation. To show this, the dissertation draws analogies between, on the one hand, "mundane-sacred judgment" and "aesthetic judgment", and, on the other hand, "religious appreciation" and "aesthetic appreciation." It also shows that, like aesthetics, the goals of the philosophical study of mundane-sacred judgment and religious appreciation are (1) to elucidate the meaning of this language and (2) to characterize its associated experience. Because the primary aim of the thesis is to suggest the existence of a neglected religious language and experience, and how they are relevant to philosophy, no single interpretation of them is proffered. Accordingly, the thesis looks at a broad constellation of philosophical ideas - ranging from ancient philosophy, to phenomenology, to analytic philosophy - and how those differing ideas might apply to this subject. Throughout, then, the reader is encouraged and challenged to consider various philosophical interpretations of mundane-sacred judgment and religious appreciation. 
    In this way, the field of philosophical debate underlying these religious issues is delineated.

    Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

    With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023). This work introduces Skill-Mix, a new evaluation to measure the ability to combine skills. Using a list of $N$ skills, the evaluator repeatedly picks random subsets of $k$ skills and asks the LLM to produce text combining that subset of skills. Since the number of subsets grows like $N^k$, for even modest $k$ this evaluation will, with high probability, require the LLM to produce text significantly different from any text in the training set. The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model. Administering a version of Skill-Mix to popular chatbots gave results that, while generally in line with prior expectations, contained surprises. Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on $k=5$ is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training. We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models.
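The sampling step described above can be sketched as follows. The skill names, the choice of N = 100, and both function names are hypothetical placeholders, not the evaluation's actual skill list or code; the point is only how quickly the number of k-subsets grows.

```python
import math
import random


def sample_skill_subset(skills, k, rng):
    """Pick k distinct skills uniformly at random."""
    return rng.sample(skills, k)


def num_subsets(n, k):
    """Number of distinct k-subsets of n skills: C(n, k), roughly n**k / k!."""
    return math.comb(n, k)


# Hypothetical placeholder skill list (N = 100 is an assumption).
skills = [f"skill_{i}" for i in range(100)]
rng = random.Random(0)  # seeded so the sketch is reproducible

subset = sample_skill_subset(skills, 5, rng)
print(subset)
print(num_subsets(len(skills), 5))
```

With these assumed values there are already about 75 million possible 5-skill combinations, which is why a randomly drawn combination is unlikely to have appeared verbatim in training text.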

    A task and performance analysis of endoscopic submucosal dissection (ESD) surgery

    BACKGROUND: ESD is an endoscopic technique for en bloc resection of gastrointestinal lesions. ESD is widely used in Japan and throughout Asia but is less prevalent in Europe or the US. The procedure is technically challenging and has a higher rate of adverse events (bleeding, perforation) than endoscopic mucosal resection. Inadequate training platforms and the lack of established training curricula have restricted its wide acceptance in the US. Thus, we aim to develop a Virtual Endoluminal Surgery Simulator (VESS) for objective ESD training and assessment. In this work, we performed task and performance analysis of ESD surgeries. METHODS: We performed a detailed colorectal ESD task analysis and identified the critical ESD steps for lesion identification, marking, injection, circumferential cutting, dissection, intraprocedural complication management, and post-procedure examination. We constructed a hierarchical task tree that elaborates the order of tasks in these steps. Furthermore, we developed quantitative ESD performance metrics. We measured task times and scores of 16 ESD surgeries performed by four different endoscopic surgeons. RESULTS: The average times of the marking, injection, and circumferential cutting phases are 203.4 s (σ: 205.46), 83.5 s (σ: 49.92), and 908.4 s (σ: 584.53), respectively. Cutting the submucosal layer takes up most of the overall ESD procedure time, with an average of 1394.7 s (σ: 908.43). We also performed correlation analysis (Pearson's test) among the performance scores of the tasks. There is a moderate positive correlation (R = 0.528, p = 0.0355) between marking scores and total scores, and a strong positive correlation (R = 0.7879, p = 0.0003) between circumferential cutting and submucosal dissection scores and total scores. Similarly, we noted a strong positive correlation (R = 0.7095, p = 0.0021) between circumferential cutting and submucosal dissection scores and marking scores.
    CONCLUSIONS: We elaborated ESD tasks and developed quantitative performance metrics used in analysis of actual surgery performance. These ESD metrics will be used in future validation studies of our VESS simulator.
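The reported correlation coefficients and p-values can be related with a short calculation. The sketch below assumes each correlation was computed over n = 16 paired scores (one per surgery) and tested with the usual two-sided t-test on n - 2 degrees of freedom; the abstract does not state either detail, and the function names are illustrative. It uses only the standard library, integrating the Student t density numerically.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Two-sided p-value for a Pearson correlation r over n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and Simpson's rule
    (steps must be even) to integrate the t density from 0 to |t|.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central_mass)


# Assumed sample size: n = 16 surgeries.
for r in (0.528, 0.7879, 0.7095):
    print(r, round(two_sided_p(r, 16), 4))
```

Under these assumptions the computed p-values come out close to the reported 0.0355, 0.0003, and 0.0021, which suggests the three tests were indeed run on all 16 surgeries.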